A Fast Two-Stage Algorithm for Computing SimRank and Its Extensions

Authors

  • Xu Jia
  • Hongyan Liu
  • Li Zou
  • Jun He
  • Xiaoyong Du
Abstract

We present a fast two-stage algorithm for computing the PageRank vector [16]. The algorithm exploits the following observation: the homogeneous discrete-time Markov chain associated with PageRank is lumpable, with the lumpable subset of nodes being the dangling nodes [13]. Time to convergence is only a fraction of what is required for the standard algorithm employed by Google [16]. On a data set of 451,237 webpages, convergence was achieved in 20% of the time. Our algorithm also replaces a common practice that is, in general, incorrect: ignoring the dangling nodes until the last stages of computation [16] does not necessarily accelerate convergence. In comparison, our algorithm is provable, generally applicable, and achieves the desired speedup. The paper ends with a discussion of possible extensions that generalize the divide-and-conquer theme. We describe two variations that incorporate a multi-stage algorithm. In the first variation, the ordinary PageRank vector is computed. In the second variation, the algorithm computes a generalized version of PageRank in which webpages are divided into several classes, each incorporating a different personalization vector. The latter represents a major modeling extension and introduces greater flexibility and a potentially more refined model for web traffic.
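
The lumping observation in the abstract can be illustrated with a small, hedged sketch. It is not the authors' implementation: it assumes a dense Google matrix with damping factor alpha = 0.85, a uniform personalization vector, and uniform treatment of dangling nodes, and all function names (power_iteration, google_matrix, lumped_pagerank) are hypothetical. Because every dangling node then has the same row, the dangling nodes can be merged into one state, the smaller chain solved by power iteration, and the dangling entries recovered with a single extra step, which is a simplification of a full second stage.

```python
def power_iteration(P, tol=1e-12, max_iter=10_000):
    """Left stationary vector of a row-stochastic matrix P (list of rows)."""
    n = len(P)
    x = [1.0 / n] * n
    for _ in range(max_iter):
        y = [sum(x[i] * P[i][j] for i in range(n)) for j in range(n)]
        s = sum(y)
        y = [v / s for v in y]
        if sum(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x


def google_matrix(links, n, alpha=0.85):
    """Dense Google matrix; uniform teleportation and dangling rows (assumed)."""
    G = []
    for i in range(n):
        out = links.get(i, [])
        if out:
            row = [(1.0 - alpha) / n] * n
            for j in out:
                row[j] += alpha / len(out)
        else:
            # Dangling node: jumps uniformly, so all dangling rows are equal.
            row = [1.0 / n] * n
        G.append(row)
    return G


def lumped_pagerank(links, n, alpha=0.85):
    """PageRank via lumping all dangling nodes into one state (sketch)."""
    G = google_matrix(links, n, alpha)
    nd = [i for i in range(n) if links.get(i)]       # non-dangling nodes
    dg = [i for i in range(n) if not links.get(i)]   # dangling nodes
    if not dg:
        return power_iteration(G)
    k = len(nd)
    u = G[dg[0]]  # the common row shared by every dangling node

    # Stage 1: stationary vector of the lumped chain with k + 1 states.
    L = [[G[i][j] for j in nd] + [sum(G[i][j] for j in dg)] for i in nd]
    L.append([u[j] for j in nd] + [sum(u[j] for j in dg)])
    sigma = power_iteration(L)

    # Stage 2: non-dangling entries are read off directly; dangling entries
    # follow from one application of the stationarity equation.
    pi = [0.0] * n
    for idx, i in enumerate(nd):
        pi[i] = sigma[idx]
    for j in dg:
        pi[j] = sum(sigma[idx] * G[i][j] for idx, i in enumerate(nd))
        pi[j] += sigma[k] * u[j]
    return pi
```

On a toy graph such as links = {0: [1, 2], 1: [2], 2: [0, 3], 3: [], 4: []} with n = 5, lumped_pagerank(links, 5) should agree with power_iteration(google_matrix(links, 5)) up to the iteration tolerance, while iterating on a chain with four states instead of five.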

Related resources

Fast top-k similarity join for SimRank

SimRank is a well-studied similarity measure between two nodes in a network. However, evaluating SimRank for all nodes in a network is not only time-consuming but also impractical, since in many real-world applications users are interested only in the most similar pairs. This paper focuses on top-k similarity join based on SimRank. In this work, we first present an incremental algorithm for com...

A Graph-Theoretic Algorithm for Automatic Extension of Translation Lexicons

This paper presents a graph-theoretic approach to the identification of yet-unknown word translations. The proposed algorithm is based on the recursive SimRank algorithm and relies on the intuition that two words are similar if they establish similar grammatical relationships with similar other words. We also present a formulation of SimRank in matrix form and extensions for edge weights, edge l...

Detection of Image Pairs Using Co-saliency Model

In this paper, a method is presented to identify co-attention objects from an image pair. This method provides an effective way to predict human fixations within multi-images, and robustly highlight co-salient regions. This method generates the SISM by computing three visual saliency maps within each image. For the MISM computation, a co-multilayer graph is introduced using a spatial pyramid repr...

A New Two-stage Iterative Method for Linear Systems and Its Application in Solving Poisson's Equation

In the current study, we investigate the two-stage iterative method for solving linear systems. Our new results show which splitting yields fast convergence in iterative methods. Finally, we solve the Poisson-Block tridiagonal matrix from Poisson's equation, which arises in mechanical engineering and theoretical physics. Numerical computations are presented based on a particular linear system...

An Efficient Similarity Search Framework for SimRank over Large Dynamic Graphs

SimRank is an important measure of vertex-pair similarity based on graph structure. Similarity search based on SimRank is an important operation for identifying similar vertices in a graph and has been employed in many data analysis applications. Real-world graphs have become much larger and more dynamic. The existing solutions for similarity search are expensive in ...

Publication date: 2010